SELECT table_id, status FROM sys.fulltext_index_fragments WHERE status=4 OR status=6;
ALTER FULLTEXT CATALOG AdventureWorks2008 REORGANIZE;
Full-Text Search Performance
SQL Server FTS performance is most sensitive to the number of rows in the result set and the number of search terms in the query. You should limit your result set to a practical number; most searchers are conditioned to look only at the first page of results, and if they don’t see what they need there, they refine the search and try again. A good practical limit for the number of rows to return is 200. If at all possible, use simple queries because they perform better than complex ones. As a rule, you should use CONTAINS rather than FREETEXT because it offers better performance, and you should use CONTAINSTABLE rather than FREETEXTTABLE for the same reason.
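As a rough illustration of the advice above, the following sketch uses the optional top_n_by_rank argument of CONTAINSTABLE to cap the result set at 200 rows. The table dbo.Documents, its columns, and the search term are hypothetical and stand in for your own schema.

```sql
-- Hypothetical table: dbo.Documents(DocumentID int PRIMARY KEY, Body nvarchar(max)),
-- with a full-text index on Body keyed on DocumentID.
SELECT d.DocumentID, ft.[RANK]
FROM CONTAINSTABLE(dbo.Documents, Body, N'performance', 200) AS ft  -- 200 = top_n_by_rank
JOIN dbo.Documents AS d
    ON d.DocumentID = ft.[KEY]      -- [KEY] is the full-text key column value
ORDER BY ft.[RANK] DESC;            -- highest-ranked matches first
```

Capping the rows inside CONTAINSTABLE, instead of with TOP on the outer query, lets the full-text engine stop ranking early.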
Several factors are involved in delivering an optimal Full-Text Search solution. Consider the following:
Avoid indexing binary content. Convert it to text, if possible. Most IFilters do not perform as well as the text IFilter.
Use an integer column on the base table as the key for the unique index that the full-text index requires.
Partition large tables into smaller tables. There seems to be a sweet spot around 50 million rows, but your results may vary. Ensure that each large table has its own catalog. Place this catalog on a RAID 10 array, preferably on its own controller.
SQL Full-Text Search benefits from multiple processors, preferably four or more. A sweet spot exists on eight-way machines or better. You will find 64-bit hardware also offers substantial performance benefits over 32-bit.
Dedicate at least 512MB to 1GB of RAM to MSFTESQL by setting the maximum server memory to 1GB less than the installed memory. Set resource usage to 5 to give a performance boost to the indexing process (that is, sp_fulltext_service 'resource_usage', 5). Set ft crawl bandwidth (max) and ft notify bandwidth (max) to 0, and set max full-text crawl range to the number of CPUs on your system. Use sp_configure to change the memory, bandwidth, and crawl range settings.
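The settings in the last item might be applied roughly as follows. This is a sketch only: the memory value assumes a hypothetical server with 16GB of RAM, and the crawl range assumes eight CPUs; substitute the figures for your own hardware.

```sql
-- The full-text options are advanced options, so expose them first.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Assumption: 16GB installed, so leave about 1GB free (value is in MB).
EXEC sp_configure 'max server memory (MB)', 15360;

EXEC sp_configure 'ft crawl bandwidth (max)', 0;
EXEC sp_configure 'ft notify bandwidth (max)', 0;

-- Assumption: eight CPUs on this system.
EXEC sp_configure 'max full-text crawl range', 8;
RECONFIGURE;

-- Resource usage is set through sp_fulltext_service, not sp_configure.
EXEC sp_fulltext_service 'resource_usage', 5;
```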
Full-Text Search Troubleshooting
The first question you should ask yourself when you have a problem with SQL Full-Text Search is this: “Is the problem with searching or with indexing?” To help you make this determination, Microsoft has included three DMVs in SQL Server 2008: sys.dm_fts_index_keywords, sys.dm_fts_index_keywords_by_document, and sys.dm_fts_parser.
The first two DMVs display the contents of your full-text index. The first DMV returns the following columns:
Keyword— Each keyword in varbinary form.
Display_term— The keyword as indexed; all the accents are removed from the word.
Column_ID— The column ID where the word exists.
Document_Count— The number of documents (rows) in which the word occurs in that column.
The second DMV breaks down the keywords by document. Like the first DMV, it contains the Keyword, Display_term, and Column_ID columns, but in addition it contains the following two columns:
Document_ID— The row in which the keyword occurs.
Occurrence_count— The number of times the word occurs in the cell (a cell is also known as a tuple; it is a row-column combination, for example, the contents of the third column in the fifth row).
The first DMV, sys.dm_fts_index_keywords, is used primarily to determine candidate noise words; it can also be used to diagnose indexing problems. The second DMV, sys.dm_fts_index_keywords_by_document, is used to determine what is stored in your index for a particular cell.
Here are some examples of their usage:
select * from sys.dm_fts_index_keywords(DB_ID(), OBJECT_ID('MyTable'))
select * from sys.dm_fts_index_keywords_by_document(DB_ID(), OBJECT_ID('MyTable'))
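To make the two uses described above more concrete, here is a minimal sketch. It assumes the same MyTable as the examples above; the cutoff of 20 keywords, the document ID 5, and the column ID 3 are arbitrary illustrative values.

```sql
-- Candidate noise words: the keywords that occur in the most rows.
SELECT TOP (20) display_term, column_id, document_count
FROM sys.dm_fts_index_keywords(DB_ID(), OBJECT_ID('MyTable'))
ORDER BY document_count DESC;

-- What the index holds for one cell (hypothetical row 5, column 3).
SELECT display_term, occurrence_count
FROM sys.dm_fts_index_keywords_by_document(DB_ID(), OBJECT_ID('MyTable'))
WHERE document_id = 5
  AND column_id = 3
ORDER BY occurrence_count DESC;
```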
These two DMVs are used to determine what occurs at index time. The third DMV, sys.dm_fts_parser, is used primarily to determine what happens at search time, in other words, how SQL Server Full-Text Search interprets your search phrase. Here is the general form of its usage:
select * from sys.dm_fts_parser(@QueryString, @LCID, @StopListID, @AccentSensitive)
@QueryString is your search word or phrase, @LCID is the locale ID for your language (determinable by querying sys.fulltext_languages), @StopListID is the ID of your stoplist (determinable by querying sys.fulltext_stoplists), and @AccentSensitive sets accent sensitivity (0 for not sensitive, 1 for sensitive to accents). Here is an example of how this works:
select * from sys.dm_fts_parser('café', 1033, 0, 1)
select * from sys.dm_fts_parser('café', 1033, 0, 0)
In the second example, you will notice that the Display_term is cafe and not café. These queries return the following columns:
Keyword— This is a varbinary representation of your keyword.
Group_id— The query parser builds a parse tree of the search phrase. If you have any Boolean searches, it assigns a different group ID to each part of the search term. For example, in the search phrase '"Hillary Clinton" OR "Barack Obama"', Hillary and Clinton belong to Group ID 1, and Barack and Obama belong to Group ID 2.
Phrase_id— Some words are indexed in multiple forms; for example, data-base is indexed as data, base, and database. In this case, data and base have the same phrase ID, and database has another phrase ID.
Occurrence— This is the position of the word in the search string, that is, the order of each term in the parse result.
Special_term— This column refers to any delimiters that the parser finds in the search phrase. Possible values are Exact Match, End of Sentence, End of Paragraph, and End of Chapter.
Display_term— This is how the term would be stored in the index.
Expansion_type— This is the type of expansion, whether it is a thesaurus expansion (4), an inflectional expansion (2), or not expanded (0). For example, the following query shows the stemmed variants of the word run.
select * from sys.dm_fts_parser('FORMSOF( INFLECTIONAL, run)', 1033, 0, 0)
Source_Term— This is the source term as it appears in your query.
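The columns described above can be inspected with a narrower select list. The sketch below first looks up the language and stoplist IDs mentioned earlier, and then parses the Boolean search phrase from the Group_id description. The LCID 1033 (English) and stoplist ID 0 (the system stoplist) are carried over from the earlier examples; your values may differ.

```sql
-- Look up candidate values for the @LCID and @StopListID arguments.
SELECT lcid, name FROM sys.fulltext_languages;
SELECT stoplist_id, name FROM sys.fulltext_stoplists;

-- Show how the parser groups the terms of a Boolean search phrase.
SELECT group_id, phrase_id, display_term, special_term, expansion_type, source_term
FROM sys.dm_fts_parser('"Hillary Clinton" OR "Barack Obama"', 1033, 0, 0);
```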
When troubleshooting indexing problems, you should consult the full-text error log, which can be found in C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\LOG. Its name starts with the prefix SQLFT, followed by the database ID (padded with leading zeros), the catalog ID (query sys.fulltext_catalogs for this value), and then the extension .log. You may find many versions of the log, each with a numerical extension, such as SQLFT0001800005.LOG.4; this is the fourth version of this log. These full-text indexing logs can be read with any text editor.
You might find entries in this log that indicate documents were retried or failed indexing, in addition to error messages returned from the IFilters.
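The log naming convention described above can be turned into a small query that lists the log file name to look for, one per catalog in the current database. This is a sketch under one assumption: both IDs are padded to five digits, which is inferred from the sample name SQLFT0001800005.LOG.4.

```sql
-- Build the expected base log file name for each full-text catalog.
-- Assumption: database ID and catalog ID are each zero-padded to five digits.
SELECT name AS catalog_name,
       'SQLFT'
       + RIGHT('00000' + CAST(DB_ID() AS varchar(10)), 5)
       + RIGHT('00000' + CAST(fulltext_catalog_id AS varchar(10)), 5)
       + '.LOG' AS expected_log_file
FROM sys.fulltext_catalogs;
```

Older versions of the same log carry an additional numerical extension after .LOG.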